-
Notifications
You must be signed in to change notification settings - Fork 471
Refactor jsx mode in parser #7751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| LessThan -> | ||
(* Imagine: <div> <Navbar /> < | ||
* is `<` the start of a jsx-child? <div … | ||
* or is it the start of a closing tag? </div> | ||
* reconsiderLessThan peeks at the next token and | ||
* determines the correct token to disambiguate *) | ||
let token = Scanner.reconsider_less_than p.scanner in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what bother me a bit, there is LessThanSlash
above, yet we still need to do the reconsider_less_than
call.
let attr_expr = parse_primary_expr ~operand:(parse_atomic_expr p) p in | ||
Some (Parsetree.JSXPropValue ({txt = name; loc}, optional, attr_expr)) | ||
| _ -> Some (Parsetree.JSXPropPunning (false, {txt = name; loc}))) | ||
(* {...props} *) | ||
| Lbrace -> ( | ||
Scanner.pop_mode p.scanner Jsx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is rather confusing when you are in a nested jsx scenario:
<div>
<p>
{foo}
</p>
</div>
Popping Jsx
from p
requires the pop of div
also to happen to get out of Jsx
mode.
posCursor:[30:12] posNoWhite:[30:11] Found expr:[30:9->32:10] | ||
JSX <di:[30:10->30:12] div[32:6->32:9]=...[32:6->32:9]> _children:None | ||
posCursor:[30:12] posNoWhite:[30:11] Found expr:[30:9->30:12] | ||
JSX <di:[30:10->30:12] > _children:None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes is because there is slightly different AST for:
<div>
<di
</div>
It used to be
[
structure_item (A.res[1,0+0]..[3,12+6])
Pstr_eval
expression (A.res[1,0+0]..[3,12+6])
Pexp_jsx_container_element "div" (A.res[1,0+1]..[1,0+4])
jsx_props =
[]
> [1,0+4]
jsx_children =
[
expression (A.res[2,6+2]..[3,12+6])
Pexp_jsx_container_element "di" (A.res[2,6+3]..[2,6+5])
jsx_props =
[
div ]
> [3,12+5]
jsx_children =
[]
]
]
and now is
[
structure_item (A.res[1,0+0]..[3,12+6])
Pstr_eval
expression (A.res[1,0+0]..[3,12+6])
Pexp_jsx_container_element "div" (A.res[1,0+1]..[1,0+4])
jsx_props =
[]
> [1,0+4]
jsx_children =
[
expression (A.res[2,6+2]..[2,6+5])
Pexp_jsx_unary_element "di" (A.res[2,6+3]..[2,6+5])
jsx_props =
[]
]
]
I think this is more correct. The unary (new) versus container (old) doesn't matter that much, neither can be determined for <di
.
However, the container had a weird prop div
, which it no longer has.
rescript
@rescript/darwin-arm64
@rescript/darwin-x64
@rescript/linux-arm64
@rescript/linux-x64
@rescript/win32-x64
commit: |
Hi @cristianoc, sorry for my eagerness, could you take a look at these changes? |
Are there differences in white space behavior? Are these intended? (For composite tokens)? Or possible ambiguities when single characters are tokenised in isolation. |
No, actually not, there is no other way the language can encounter Safe travels, will ask someone else for a review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the JSX mode handling in the ReScript parser by removing the convoluted JSX mode mechanism from the scanner and moving JSX-specific logic to the parser. The refactoring streamlines JSX parsing by eliminating the need for Scanner.set_jsx_mode
and Scanner.pop_mode
calls throughout the codebase, and introduces lookahead functionality for better JSX token handling.
Key changes:
- Removes JSX mode from the scanner and replaces it with parser-level JSX identifier handling
- Introduces lookahead functions (
peekMinus
,peekSlash
) for better JSX token disambiguation - Adds a token debugger tool for development purposes
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
Show a summary per file
File | Description |
---|---|
compiler/syntax/src/res_scanner.ml |
Removes JSX mode handling and adds lookahead functions for minus and slash characters |
compiler/syntax/src/res_core.ml |
Implements JSX identifier parsing in the parser with lookahead-based logic |
compiler/syntax/src/res_token.ml |
Removes LessThanSlash token type |
compiler/syntax/src/res_token_debugger.ml |
Adds new token debugging utility |
tests/syntax_tests/data/parsing/errors/expressions/expected/jsx.res.txt |
Updates expected error message |
Various package.json files | Reverts to legacy clean command for analysis tests |
p.token <- token | ||
| Uident txt when Scanner.peekMinus p.scanner -> | ||
let buffer = Buffer.create (String.length txt) in | ||
Buffer.add_string buffer txt; | ||
Parser.next p; | ||
let name = visit buffer |> Buffer.contents in | ||
let token = Token.Uident name in | ||
p.token <- token |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Direct mutation of parser state (p.token <-
) breaks encapsulation and makes the code harder to reason about. Consider using a proper parser method or returning the modified token instead of mutating parser state directly.
p.token <- token | |
| Uident txt when Scanner.peekMinus p.scanner -> | |
let buffer = Buffer.create (String.length txt) in | |
Buffer.add_string buffer txt; | |
Parser.next p; | |
let name = visit buffer |> Buffer.contents in | |
let token = Token.Uident name in | |
p.token <- token | |
set_token p token | |
| Uident txt when Scanner.peekMinus p.scanner -> | |
let buffer = Buffer.create (String.length txt) in | |
Buffer.add_string buffer txt; | |
Parser.next p; | |
let name = visit buffer |> Buffer.contents in | |
let token = Token.Uident name in | |
set_token p token |
Copilot uses AI. Check for mistakes.
p.token <- token | ||
| Uident txt when Scanner.peekMinus p.scanner -> | ||
let buffer = Buffer.create (String.length txt) in | ||
Buffer.add_string buffer txt; | ||
Parser.next p; | ||
let name = visit buffer |> Buffer.contents in | ||
let token = Token.Uident name in | ||
p.token <- token |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Direct mutation of parser state (p.token <-
) breaks encapsulation and makes the code harder to reason about. Consider using a proper parser method or returning the modified token instead of mutating parser state directly.
p.token <- token | |
| Uident txt when Scanner.peekMinus p.scanner -> | |
let buffer = Buffer.create (String.length txt) in | |
Buffer.add_string buffer txt; | |
Parser.next p; | |
let name = visit buffer |> Buffer.contents in | |
let token = Token.Uident name in | |
p.token <- token | |
set_token p token | |
| Uident txt when Scanner.peekMinus p.scanner -> | |
let buffer = Buffer.create (String.length txt) in | |
Buffer.add_string buffer txt; | |
Parser.next p; | |
let name = visit buffer |> Buffer.contents in | |
let token = Token.Uident name in | |
set_token p token |
Copilot uses AI. Check for mistakes.
tests/syntax_tests/data/parsing/errors/expressions/expected/jsx.res.txt
Outdated
Show resolved
Hide resolved
Hitting AI with AI https://chatgpt.com/share/68933fb4-cef4-8011-a04c-a30808a82b89 Jump to the end for the relevant summary |
I was experimenting with the parser related to JSX and noticed that we have a somewhat convoluted mechanism for handling
/>
and-
in identifier names.First, I created a token dump tool in
res_parser
, which was previously missing.Currently, the parser employs a sequence of
Scanner.set_jsx_mode p.Parser.scanner;
andScanner.pop_mode p.scanner Jsx;
while processing elements to distinguish between parser JSX and non-JSX. However, this is mainly used in the following cases:-
inside an identifier. This logic belongs in the parser, but it currently resides in the scanner, which feels inappropriate.<
+/
into a</
token. I would prefer using a lookahead for when a<
is encountered. This would clarify that it's specific to JSX parsing. Even though LessThanSlash exists, there is still a separate LessThan + Slash check, which makes the code a bit messy.This is an effort to streamline the JSX mode.
PS: to run the local analysis tests, I had to revert to
legacy clean
. This is for the better until we figure out #7707